Read Alignment

Variant Calling

Overview

  • GATK4 best practices
  • Site level filtering:
      • bi-allelic SNPs
      • 95% truth sensitivity (VQSR)
  • Genotype level filtering:
      • DP: 4 - 77 (mean DP + 4*SD)
      • GQ >= 20
      • Else missing (./.)
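
The genotype-level rule above can be sketched as a small function. This is only an illustration, not the pipeline's actual code: the function name, the argument names, and the choice to treat both DP bounds as inclusive are assumptions.

```python
# Thresholds copied from the overview; everything else is hypothetical.
DP_MIN, DP_MAX = 4, 77   # assumed inclusive on both ends
GQ_MIN = 20

def filter_genotype(gt, dp, gq):
    """Return the genotype unchanged if it passes DP and GQ, else './.'."""
    if dp is None or gq is None:
        # No depth or quality annotation: treat the call as missing.
        return "./."
    if DP_MIN <= dp <= DP_MAX and gq >= GQ_MIN:
        return gt
    return "./."

# Toy calls: (GT, DP, GQ)
calls = [("0/1", 30, 45), ("1/1", 3, 60), ("0/0", 90, 99), ("0/1", 12, 15)]
filtered = [filter_genotype(*c) for c in calls]
print(filtered)
```

Only the first toy call survives; the others fail on low depth, high depth, and low genotype quality respectively.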

Number of Variants

variable                 raw         site_filtered  gt_filtered
TOTAL_SNPS               16,643,754  9,115,052      5,126,585
NUM_IN_DB_SNP            12,117,419  8,486,322      4,846,921
NOVEL_SNPS               4,526,335   628,730        279,664
PCT_DBSNP                0.73        0.93           0.95
DBSNP_TITV               1.94        2.07           2.30
NOVEL_TITV               1.07        1.53           1.82
TOTAL_INDELS             3,454,237   0              0
TOTAL_MULTIALLELIC_SNPS  856,605     0              0
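
As a quick consistency check, NOVEL_SNPS and PCT_DBSNP can be recomputed from TOTAL_SNPS and NUM_IN_DB_SNP. The sketch below assumes that NOVEL_SNPS is the difference of the two counts and PCT_DBSNP is their ratio; the dictionary layout and function name are illustrative only.

```python
# (TOTAL_SNPS, NUM_IN_DB_SNP) per filtering stage, copied from the table.
counts = {
    "raw": (16_643_754, 12_117_419),
    "site_filtered": (9_115_052, 8_486_322),
    "gt_filtered": (5_126_585, 4_846_921),
}

def dbsnp_summary(total, in_dbsnp):
    """Return (novel SNP count, fraction of SNPs in dbSNP rounded to 2 places)."""
    novel = total - in_dbsnp
    return novel, round(in_dbsnp / total, 2)

summary = {stage: dbsnp_summary(*pair) for stage, pair in counts.items()}
for stage, (novel, pct) in summary.items():
    print(stage, novel, pct)
```

Under these assumptions the recomputed values agree with the NOVEL_SNPS and PCT_DBSNP rows for all three stages.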

SNPs per sample

Sample metrics

Population metrics

Population VCFs were produced from the cohort final VCF by extracting each population's samples and then removing sites fixed for the reference allele (GT==“RR”) and sites where every genotype is missing (GT==“./.”).
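
A minimal sketch of the per-population site filter, assuming genotypes are available as plain VCF-style strings. The function name is hypothetical, and the sketch assumes a site counts as fixed reference when all of its called genotypes are homozygous reference, even if some samples are missing.

```python
MISSING = {"./.", ".|."}
HOM_REF = {"0/0", "0|0"}   # "RR" in the notation used above

def keep_site(genotypes):
    """Decide whether a site stays in a population VCF."""
    called = [g for g in genotypes if g not in MISSING]
    if not called:
        return False            # all-missing site
    if all(g in HOM_REF for g in called):
        return False            # fixed for the reference allele
    return True

# Toy sites for one population of four samples.
sites = {
    "site_a": ["0/0", "0/1", "./.", "1/1"],
    "site_b": ["0/0", "0/0", "./.", "0/0"],
    "site_c": ["./.", "./.", "./.", "./."],
}
kept = [name for name, gts in sites.items() if keep_site(gts)]
print(kept)
```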

Intersections (UpSetR)

Missingness

min_called_samples DUK DUC DU6 DU6P DUhLB FZTDU
1 1.00 1.00 1.00 1.00 1.00 1.00
2 1.00 1.00 1.00 1.00 1.00 1.00
3 1.00 1.00 1.00 1.00 1.00 1.00
4 1.00 1.00 1.00 1.00 1.00 1.00
5 1.00 1.00 1.00 1.00 1.00 1.00
6 1.00 1.00 1.00 1.00 1.00 1.00
7 1.00 1.00 1.00 1.00 1.00 1.00
8 1.00 0.99 0.99 1.00 1.00 1.00
9 1.00 0.99 0.99 0.99 1.00 1.00
10 0.99 0.99 0.99 0.99 0.99 1.00
11 0.96 0.99 0.84 0.99 0.97 1.00
12 0.92 0.98 0.66 0.98 0.95 0.99
13 0.88 0.95 0.50 0.96 0.91 0.99
14 0.83 0.91 0.38 0.93 0.86 0.97
15 0.78 0.86 0.27 0.89 0.82 0.94
16 0.73 0.79 0.19 0.85 0.76 0.90
17 0.67 0.70 0.13 0.80 0.71 0.85
18 0.61 0.60 0.08 0.74 0.65 0.79
19 0.54 0.49 0.05 0.67 0.59 0.72
20 0.46 0.38 0.03 0.58 0.52 0.65
21 0.38 0.26 0.01 0.47 0.45 0.57
22 0.29 0.16 0.01 0.34 0.37 0.50
23 0.20 0.08 0.00 0.21 0.28 0.40
24 0.11 0.03 0.00 0.09 0.18 0.29
25 0.03 0.01 0.00 0.03 0.08 0.14

Figures were produced externally from the site-filtered bi-allelic SNPs with the script “./batches123_04_FinalVCF/scripts/15_visualize_missingness.R”.
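
One way to read the table above is as the fraction of SNPs, per population, that have at least min_called_samples called genotypes. That reading is an assumption, and the sketch below is not the referenced R script; it only shows how such a cumulative fraction could be computed from per-site counts of called samples.

```python
def fraction_with_min_called(called_counts, n_samples):
    """Map each threshold k (1..n_samples) to the fraction of sites
    with at least k called samples."""
    n_sites = len(called_counts)
    return {
        k: sum(c >= k for c in called_counts) / n_sites
        for k in range(1, n_samples + 1)
    }

# Toy data: number of called samples at four sites in a 25-sample population.
toy_counts = [25, 24, 20, 10]
fractions = fraction_with_min_called(toy_counts, n_samples=25)
print(fractions[1], fractions[21], fractions[25])
```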

More about GT-filtering

min_N_non_miss_per_group N_SNPs
10 8875347
11 7384239
12 5794507
13 4415506
14 3279141
15 2364751
16 1645183
17 1091424
18 683508
19 396131
20 206405
21 93424
22 34033
23 9207
24 1277
25 32
  • If at least 1 of 6 groups has a minimum of 20 of 25 non-missing samples: 7,064,770 SNPs
  • If all 6 of 6 groups have a minimum of 20 of 25 non-missing samples: 34,033 SNPs
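
The two criteria differ only in how many groups must reach the non-missing threshold. A hedged sketch of that counting logic follows; the function name and the toy per-site counts are made up for illustration, and only the group labels come from the tables above.

```python
def count_passing(sites, min_non_missing, min_groups):
    """Count sites where at least `min_groups` groups have
    `min_non_missing` or more non-missing samples."""
    return sum(
        sum(n >= min_non_missing for n in per_group.values()) >= min_groups
        for per_group in sites
    )

# Toy data: non-missing sample counts per group at three sites.
toy_sites = [
    {"DUK": 25, "DUC": 22, "DU6": 20, "DU6P": 24, "DUhLB": 21, "FZTDU": 25},
    {"DUK": 21, "DUC": 12, "DU6": 8, "DU6P": 19, "DUhLB": 15, "FZTDU": 18},
    {"DUK": 10, "DUC": 9, "DU6": 5, "DU6P": 12, "DUhLB": 11, "FZTDU": 14},
]
any_group = count_passing(toy_sites, min_non_missing=20, min_groups=1)
all_groups = count_passing(toy_sites, min_non_missing=20, min_groups=6)
print(any_group, all_groups)
```

Requiring all groups is far stricter than requiring any one group, which matches the large gap between the two SNP counts reported above.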

SNP Annotations

LDD & SFS

Allele frequency state per population

LD Decay

Alternative Allele Frequency Distribution

Minor Allele Frequency Distribution

Diversity

Genetic Structure

PC1 - PC2

PC2 - PC3

PC3 - PC4

PC4 - PC5

PC5 - PC6

Scree Plot

Hierarchical Clustering

Admixture

Admixture detailed

K2

K3

K4

K5

Genetic Differentiation

z-Fst Histogram

z-Fst Boxplots

Directional Signatures